499 research outputs found

    Memory-Efficient Topic Modeling

    As one of the simplest probabilistic topic modeling techniques, latent Dirichlet allocation (LDA) has found many important applications in text mining, computer vision and computational biology. Recent training algorithms for LDA can be interpreted within a unified message passing framework. However, message passing requires storing previous messages, which takes a large amount of memory that grows linearly with the number of documents or the number of topics. High memory usage is therefore often a major problem for topic modeling of massive corpora containing a large number of topics. To reduce the space complexity, we propose a novel algorithm for training LDA that does not store previous messages: tiny belief propagation (TBP). The basic idea of TBP is to relate message passing algorithms to non-negative matrix factorization (NMF) algorithms, which absorb the message updating into the message passing process and thus avoid storing previous messages. Experimental results on four large data sets confirm that TBP performs comparably to or even better than current state-of-the-art training algorithms for LDA, but with much lower memory consumption. TBP can do topic modeling when massive corpora cannot fit in computer memory, for example, extracting thematic topics from a 7 GB PUBMED corpus on a common desktop computer with 2 GB of memory.
    Comment: 20 pages, 7 figures

    A New Approach to Speeding Up Topic Modeling

    Latent Dirichlet allocation (LDA) is a widely used probabilistic topic modeling paradigm that has recently found many applications in computer vision and computational biology. In this paper, we propose a fast and accurate batch algorithm, active belief propagation (ABP), for training LDA. Batch LDA algorithms usually require repeated scanning of the entire corpus and searching of the complete topic space. When processing massive corpora with a large number of topics, each training iteration of a batch LDA algorithm is therefore often inefficient and time-consuming. To accelerate training, ABP actively scans only a subset of the corpus and searches only a subset of the topic space, and therefore saves substantial training time in each iteration. To ensure accuracy, ABP selects only those documents and topics that contribute the largest residuals within the residual belief propagation (RBP) framework. On four real-world corpora, ABP performs around 10 to 100 times faster than state-of-the-art batch LDA algorithms with comparable topic modeling accuracy.
    Comment: 14 pages, 12 figures

    Deep Learning the Effects of Photon Sensors on the Event Reconstruction Performance in an Antineutrino Detector

    We provide a fast approach that uses deep learning to evaluate the effects of photon sensors in an antineutrino detector on its event reconstruction performance. This work is an attempt to harness the power of deep learning for detector design and upgrade planning. Using the Daya Bay detector as a benchmark case and the vertex reconstruction performance as the objective for the deep neural network, we find that the photomultiplier tubes (PMTs) differ in their relative importance to the vertex reconstruction. More importantly, the vertex position resolutions for the Daya Bay detector follow approximately a multi-exponential relationship with respect to the number of PMTs and hence the coverage. This could also assist in deciding on the merits of installing additional PMTs in future detector plans. The approach could easily be used with other objectives in place of vertex reconstruction.

    catena-Poly[[[[N′-(4-cyanobenzylidene)nicotinohydrazide]silver(I)]-μ-[N′-(4-cyanobenzylidene)nicotinohydrazide]] hexafluoridoarsenate]

    In the title compound, {[Ag(C14H10N4O)2]AsF6}n, the Ag(I) ion is coordinated by two N atoms from two different pyridyl rings and one N atom from one carbonitrile group of three different N′-(4-cyanobenzylidene)nicotinohydrazide ligands in a distorted T-shaped geometry. The Ag—N(carbonitrile) bond distance is significantly longer than the Ag—N(pyridyl) distances. The bond angles around the Ag(I) atom also deviate from those of an ideal T-shaped geometry. One type of ligand acts as a bridge that connects Ag(I) atoms into chains along [01]. These chains are linked to each other via N—H⋯O hydrogen bonds and Ag⋯O interactions with an Ag⋯O separation of 2.869 (2) Å. In addition, the [AsF6]− counter-anions are linked to the hydrazone groups through N—H⋯F hydrogen bonds. Four of the F atoms of the [AsF6]− anion are disordered over two sets of sites with occupancies of 0.732 (9) and 0.268 (9).